Semantic Based Text Classification Using WordNets: Indian Language Perspective

Authors

  • S. Mohanty
  • Sabyasachi Swain
Abstract

Automatic text classification has received a great deal of attention in recent research because the growth of the Internet has produced a huge amount of information that is difficult to access efficiently. This paper reports experimental results on building an efficient and effective automatic tool that can classify large documents quickly. Our method builds lexical chains that link significant words about a particular topic, using the hypernym relation in WordNet. We tested the method on the Indian language Sanskrit using SanskritNet, extracting and scoring lexical chains under the necessary design decisions.
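The idea of chaining topic words through hypernym links can be illustrated with a minimal sketch. It is not the authors' implementation: the hypernym map, the topic roots, and the scoring rule below are all hypothetical stand-ins (a real system would query SanskritNet or another WordNet instead of a hand-written dictionary).

```python
from collections import Counter, defaultdict

# Toy hypernym map (child -> parent). Assumed data, not taken from SanskritNet.
HYPERNYM = {
    "cricket": "sport",
    "football": "sport",
    "batsman": "player",
    "player": "sport",
    "rupee": "currency",
    "currency": "finance",
    "bank": "finance",
    "loan": "finance",
}

# Hypothetical category roots that a chain can be anchored to.
TOPICS = {"sport", "finance"}


def topic_of(word):
    """Follow hypernym links upward until a topic root is reached, else None."""
    seen = set()
    while word is not None and word not in seen:
        if word in TOPICS:
            return word
        seen.add(word)
        word = HYPERNYM.get(word)
    return None


def build_chains(tokens):
    """Group tokens into one lexical chain per topic root, keeping word counts."""
    chains = defaultdict(Counter)
    for tok in tokens:
        root = topic_of(tok)
        if root is not None:
            chains[root][tok] += 1
    return chains


def score(chain):
    """Assumed heuristic: total occurrences times number of distinct members."""
    return sum(chain.values()) * len(chain)


def classify(tokens):
    """Return the topic root of the highest-scoring chain, or None if no chain forms."""
    chains = build_chains(tokens)
    if not chains:
        return None
    return max(chains, key=lambda root: score(chains[root]))
```

For example, `classify("the batsman played cricket while the bank offered a loan for cricket".split())` builds one chain under `sport` and one under `finance`, then picks the stronger chain. The scoring rule here is only one plausible choice; the abstract does not state which one the paper uses.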


Similar resources

Complex Predicates in Indian Language Wordnets

Wordnets, which are repositories of lexical semantic knowledge containing semantically linked synsets and lexically linked words, are indispensable for work on computational linguistics and natural language processing. While building wordnets for Hindi and Marathi, two major Indo-European languages, we observed that the verb hierarchy in the Princeton Wordnet was rather shallow. We set to constr...


Cross-Lingual Sentiment Analysis for Indian Languages using Linked WordNets

Cross-Lingual Sentiment Analysis (CLSA) is the task of predicting the polarity of the opinion expressed in a text in a language L_test using a classifier trained on the corpus of another language L_train. Popular approaches use Machine Translation (MT) to convert the test document in L_test to L_train and use the classifier of L_train. However, MT systems do not exist for most pairs of languages ...


The Use of WordNets for Multilingual Text Categorization: A Comparative Study

The successful use of the Princeton WordNet for Text Categorization has prompted the creation of similar WordNets in other languages as well. This paper focuses on a comparative study between two WordNet-based approaches for Multilingual Text Categorization. The first relies on using machine translation to access the Princeton WordNet directly, while the second avoids machine translation by usi...


A Joint Semantic Vector Representation Model for Text Clustering and Classification

Text clustering and classification are two main tasks of text mining. Feature selection plays a key role in the quality of the clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcoming in capturing semantic concepts of text motivated researchers to use...


Semantic Prosody: Its Knowledge and Appropriate Selection of Equivalents

In translation, choosing an appropriate equivalent is essential to convey the right message from source text to target text, and one of the issues that may play a determining role in that choice is the semantic prosody (SP) behavior of words and the relation existing between the SP of a word and semantic senses (i.e. negativity, positivity or neutrality) of its collocations in ...




Publication date: 2006